SUMMARY PAGE


THE  PROBLEM 

Significant  progress  has  been  made  in  the  development  of  automated 
speech  understanding  systems  for  application  to  naval  aviation  systems. 

One advantage that is anticipated for speech over conventional man-machine
interfaces is that speech could function as an independent channel for the
control of systems. The experiment reported in this paper represents a
preliminary investigation of the assumption that an automatic speech
synthesis and recognition system can provide the human operator an
additional and parallel channel for processing information and effecting
control responses.

The experiment required human subjects to timeshare a digital information
processing task and a continuous compensatory tracking task.
Independent variables in the design were task loading (single- vs. dual-task
conditions), stimulus presentation modality for the digital task (auditory
vs. visual), and response modality for the digital task (voice vs.
keyboard). Data from 16 subjects were analyzed.

FINDINGS 

The results indicated that the combination of visual stimulus modality
and voice response provided optimum joint-task performance. No combination
of stimulus and response modalities resulted in equivalent single- and
dual-task performance. Future experiments should be designed to investigate the
joint-task performance space for tasks that are more representative of the
information processing performance requirements of specific systems.
However, the interpretability of the results of such research will depend upon
the solution of methodological problems, such as how to control or account
for subjects' speed-accuracy tradeoff strategies and the priorities they place
upon the concurrent tasks.


INTRODUCTION 


Recent advances in artificial intelligence technology have resulted in
commercially available computer systems that are able to synthesize auditory
messages and to recognize spoken words and phrases in near-real-time with
a high level of reliability. Several researchers have enumerated the benefits
that are expected to accrue from computer recognition of speech (C 9). Beek,
Neuberg, and Hodge (1) summarized the possible applications of this new
technology to military systems. Curran (4), and Coler, Plummer, Huff, and
Hitchcock (3) reported significant progress in the development of automated
speech understanding systems for the control of on-board systems in military
aircraft.

Among the anticipated advantages for automatic speech recognition over
conventional techniques for effecting man-machine communication is that speech
should function as an additional, independent channel for the control of systems
(12). It is assumed that even though the operator's eyes and hands may be
heavily occupied in the performance of a task, in many instances he would have
adequate residual processing capacity to perform another task if the information
to be processed could be presented aurally and responses could be made vocally.
The results of recent research are equivocal on this point (5). The most frequent
finding has been that the performance of one or both tasks will be degraded when
a visual/manual task is performed concurrently with an auditory/vocal task. The
experiment reported in this paper represents an initial effort to investigate the
assumption that an automatic speech synthesis and recognition system can provide
the human operator an additional, parallel channel for processing information and
effecting control responses. A central focus of the present study was to
determine not only if performance capabilities will be enhanced, but also the
nature and extent of combined-task performance tradeoffs when audition and
vocalization are used as alternatives to visual input and manual output modalities.

In two recent theoretical papers Norman and Bobrow (10, 11) introduced
the concept of a system of limited processing resources to account for the limits
in human information processing. Examples of resources are effort, memory
capacity, and information channels. When two concurrent processes require
access to the same resource, that resource must be allocated between them.
Performance of one or both of the competing processes will deteriorate when
the amount of the resource required by both processes exceeds the limit
available to the system. To examine the tradeoff that occurs when two tasks are
performed concurrently, Norman and Bobrow proposed the use of a performance
operating characteristic (POC). The POC is a plot of performance on one task
as a function of conjoint performance on another task, and is generated by
varying resource allocation between two time-shared tasks. As Norman and
Bobrow pointed out, the interpretation of a POC depends upon the assumption
of complete complementarity of processing resources required for the competing
tasks. Navon and Gopher (7) noted that complementarity is only one of several
ways in which resources may be shared between two complex tasks. They
showed that interpretation of an empirical POC requires considerable knowledge
of the specific resources required for each of the competing tasks.
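
As a rough illustration of how a POC is generated, the sketch below assumes
complete complementarity: a single resource of fixed capacity is split between
two tasks, and each task's performance is a saturating function of its share.
The function names, the exponential form, and the demand values are
illustrative assumptions and are not taken from Norman and Bobrow.

```python
import math

def performance(resource, demand):
    """Hypothetical performance-resource function: rises with the
    resource share and saturates toward 1.0 (assumed form)."""
    return 1.0 - math.exp(-resource / demand)

def poc(demand_a, demand_b, capacity=1.0, steps=11):
    """Return (perf_a, perf_b) pairs as the allocation to task A is
    varied from zero to the full capacity; task B gets the remainder."""
    points = []
    for i in range(steps):
        share_a = capacity * i / (steps - 1)
        share_b = capacity - share_a  # complete complementarity
        points.append((performance(share_a, demand_a),
                       performance(share_b, demand_b)))
    return points

# Each point is one resource-allocation policy between the two tasks.
curve = poc(demand_a=0.3, demand_b=0.5)
```

Under these assumptions every gain on one task is paid for by a loss on the
other. If the tasks drew on partly separate resources, as Navon and Gopher
discuss, the curve would take a different shape.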

To summarize the arguments of Norman and Bobrow, Navon and Gopher,
and others (see Kantowitz and Knight, 8), two parameters must be considered
when tasks are performed concurrently: the relative priorities between the
tasks, and the specific resources required by the tasks. In the present
experiment relative priorities between two time-shared tasks were held constant,
and the input and output (I/O) channels for one of the tasks were varied. The two
tasks chosen for the experiment included a continuous compensatory tracking
task and a digital information processing task. The independent variables
were: task loading (single- vs. dual-task conditions); stimulus presentation
mode for the digit task (visual vs. auditory); and response mode for the digit
task (vocal or manual).


PROCEDURE 


SUBJECTS 

Twenty male naval officers and civilian staff members participated as
subjects in the experiment. All subjects were right-handed and were between the
ages of 23 and 38 years.

EXPERIMENTAL  DESIGN 

Subjects were tested in single- and dual-task performance of both a
one-dimensional compensatory tracking task and a continuous absolute difference
digit-processing task. As mentioned above, the three independent variables
were task loading and stimulus and response modalities in the absolute difference
task. Figure 1 shows the eight experimental conditions in the design. The
stimulus presentation modality for the absolute difference task represented a
between-subject variable with ten subjects serving in each condition. Task
loading and response modality were within-subject variables. The various
experimental trials are presented in Table I. The order of trials 1T1, 1T2, 1V,
and 1K was counterbalanced across the ten subjects, as was the order of trials
2V and 2K. Subjects were tested in the same conditions on each of two
successive days.
Table I

Experimental Trial Sequence

TRIAL   DESCRIPTION

1T1     Single-Task Tracking (3 minutes)
1T2     Single-Task Tracking (3 minutes)
1V      Single-Task Subtraction, Vocal Responses (50 trials)
1K      Single-Task Subtraction, Keyboard Responses (50 trials)
2V      Dual-Task with Vocal Responses (3 minutes)
2K      Dual-Task with Keyboard Responses (3 minutes)


APPARATUS 

The experiment employed one subject booth of the Multipurpose Automated
Research Test Station (MARTS) system illustrated in Figure 2. Stimulus
sequences were controlled by a Data General Corporation NOVA 800
minicomputer with 32K x 16 core memory. The computer console was used by the
experimenter for input of experimental conditions and for display of performance
statistics at the end of each trial. The line printer, a Versatec Matrix
electrostatic printer-plotter, provided output of more complete tables and graphs
of subject performance at the end of each test session. On-line storage of data
was accomplished by means of the magnetic disk. The analog-to-digital (A/D)
converter and standard multi-line asynchronous data multiplexor (MPX)
converted voltage signals from the joystick, and accepted codes from the keyboard,
respectively. A custom-built interface (the MODS in Figure 2) received and
decoded switch closures from the keyboard and transmitted codes to the NOVA
800 MPX device.

The Megatek Corporation Megagraphics 6000 system used to display
tracking and digit-processing tasks to the subjects is a random stroke-drawn
cathode-ray tube (CRT) display system capable of presenting alphanumerics
and other line-drawn shapes. Stimuli were presented on a Hewlett-Packard
model 1310A CRT oscilloscope.

The keyboard, configured with microswitches, was positioned on the
left side of the testing booth and arranged in two rows of four buttons each:
0 - 1 - 2 - 3 (bottom row), and 4 - 5 - 6 - 7 (top row). Switch travel was
approximately 1 mm before contact. The two-row arrangement was selected to
provide rapid learning of the keyboard and to decrease the requirements to shift
visual attention from the CRT.

Voice recognition and synthesis functions were performed by a Scope
Electronics Voice Data Entry Terminal System (VDETS)*, consisting of a
Data General Corporation NOVA 2/10 with 16K x 16 core memory, a Scope user's
station, a voice synthesizer, and an ASR-33 Teletype. The NOVA 2/10 was
linked to the NOVA 800 host computer via a duplex MPX channel. The Scope
user's station converted voice analog signals from a microphone mounted on the
subject's headset to digital format for entry into the NOVA 2/10. A Vocal
Interface Division model VS-6 Votrax (VTX) voice synthesis unit provided auditory
output signals to the subject's headset in the testing booth. The Teletype was
used by the experimenter to control and monitor the VDETS utterance
recognition performance.

SINGLE-TASK PROCEDURES, TRACKING

The subjects performed a one-dimensional compensatory tracking task
requiring appropriate left-right movements of a joystick control to maintain the
position of a diamond-shaped cursor in the center of the 9-cm-long horizontal
track (see Figure 3). The disturbance forcing function input consisted of the
sum of three nonharmonically related sinusoidal waveforms. The joystick was
a Measurement Systems, Inc., model 526 spring-centered finger control with
lateral deflection range of ±30 degrees, a break-out force of 170 gm, and a
full-deflection actuating force of 283.5 gm.
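
A disturbance of the kind described can be sketched as follows. The report
states only that three nonharmonically related sinusoids were summed, so the
component frequencies, the amplitudes, and the 50-msec sampling interval used
here are assumptions chosen for illustration.

```python
import math

# Assumed component frequencies (Hz) and amplitudes. No frequency is an
# integer multiple of another, so the components are nonharmonic.
FREQS_HZ = (0.07, 0.13, 0.23)
AMPLITUDES = (1.0, 0.5, 0.25)

def forcing_function(t):
    """Disturbance value at time t (seconds): sum of three sinusoids."""
    return sum(a * math.sin(2.0 * math.pi * f * t)
               for a, f in zip(AMPLITUDES, FREQS_HZ))

# Sample one 3-minute trial at an assumed rate of one sample per 50 msec.
samples = [forcing_function(n * 0.05) for n in range(3 * 60 * 20)]
```

Because the components are not harmonically related, the summed waveform does
not repeat over a short period, which makes the disturbance hard to anticipate.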

Subjects tracked for two 3-minute trials with a 2-minute rest period
intervening. The joystick initially acted as a pure velocity controller.
Task difficulty was adaptively increased by adjusting the ratio of
acceleration-to-velocity components in the stick control dynamics. When the subject
maintained less than 20 percent of scale error, the percentage of acceleration
gradually increased in 0.05-percent steps every 50 msec. Acceleration was
decreased in the same manner whenever the subject was outside the adaptive
criterion. The difficulty of the tracking task was manipulated in this manner
in an attempt to reduce the effects of individual differences in tracking skill on
the dependent measures of tracking performance. This percent acceleration
variable was successful in manipulating tracking difficulty in previous studies
(2).
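
The adaptive rule can be sketched as below. The step size and the criterion
come from the text. The clamp to the 0-100 percent range and the linear mixing
of velocity and acceleration control in `control_rate` are assumptions, since
the report does not give the control law itself.

```python
STEP = 0.05        # percent acceleration per 50-msec update (from the text)
CRITERION = 20.0   # percent-of-scale error criterion (from the text)

def update_acceleration(pct_accel, error_pct_of_scale):
    """Apply one 50-msec update to the percent-acceleration term."""
    if abs(error_pct_of_scale) < CRITERION:
        pct_accel += STEP   # inside criterion: make the task harder
    else:
        pct_accel -= STEP   # outside criterion: make the task easier
    # Clamp to the 0-100 percent range (assumed limits).
    return min(100.0, max(0.0, pct_accel))

def control_rate(stick, stick_integral, pct_accel):
    """Cursor rate as a blend of velocity and acceleration control
    (hypothetical mixing law). With pct_accel = 0 the stick acts as a
    pure velocity controller, as in the initial condition described."""
    w = pct_accel / 100.0
    return (1.0 - w) * stick + w * stick_integral
```

At 20 updates per second, one second of on-criterion tracking raises the
acceleration component by 1 percent under these assumptions.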


The task remained adaptive for the first four minutes of performance (the
entire first trial plus 1 minute of the second trial) and remained at the attained
percent acceleration for the final 2 minutes of the second trial. A digital
approximation to Root Mean Square Error (RMSE) was computed over 10-second
intervals for the final 2 minutes of single-task performance, and the mean and
standard deviation of these values were computed to represent the subject's
single-task tracking performance. Time on target (TOT) was also computed
for this interval.
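
The single-task tracking measures can be sketched as follows. The interval
RMSE and its mean and standard deviation follow the description above. The
report does not define the on-target zone for TOT, so the 20-percent-of-scale
criterion used in `time_on_target` is an assumption.

```python
import math
import statistics

def rmse(errors):
    """Root mean square of a list of error samples."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def interval_rmse(errors, samples_per_interval):
    """RMSE for each complete, non-overlapping interval of samples."""
    n = samples_per_interval
    return [rmse(errors[i:i + n])
            for i in range(0, len(errors) - n + 1, n)]

def time_on_target(errors, criterion=20.0):
    """Percent of samples with absolute error inside the criterion
    (the criterion value is an assumed definition of 'on target')."""
    hits = sum(1 for e in errors if abs(e) < criterion)
    return 100.0 * hits / len(errors)

def summarize(errors, samples_per_interval):
    """Mean and standard deviation of the interval RMSE values."""
    values = interval_rmse(errors, samples_per_interval)
    return statistics.mean(values), statistics.stdev(values)
```

The mean and standard deviation returned by `summarize` are the quantities
that later scale the dual-task feedback display.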


*VDETS is now marketed by Interstate Electronics.

A continuous visual performance feedback indicator was presented to the
subject throughout single-task performance in the form of a vertically moving
bar graph (see Figure 3). The momentary height of the graph, updated each
second, corresponded to tracking TOT computed over the immediately preceding
10-second interval; the higher the indicator, the better the performance.
A small rectangular box indicated a desired performance level which the subject
was instructed to reach or exceed. This level represented 50 percent TOT,
which corresponded to the adaptive criterion (20 percent of scale). The
maximum height of the performance indicator corresponded to a 100 percent TOT
score. In single-task conditions the underlying scale of the feedback graph
was linear.

SINGLE-TASK PROCEDURES, DIGIT PROCESSING

The digit-processing task required subjects to compute the absolute
difference between two successive digits in a pseudo-random sequence.
Stimulus digits varied between 0 and 7. Responses fell within the same range.
The task was subject-paced. As soon as the subject responded with the absolute
value of the difference between the current digit and the previous digit in the
sequence, a new digit was presented. An example of a typical presentation
sequence and associated responses is given below:

Stimulus sequence: 7-4-2-7-3-5-2-0...

Subject responses: 3-2-5-4-2-3-2...
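
The response rule can be checked against the example sequence with a few lines
of code. The function name is illustrative.

```python
def expected_responses(stimuli):
    """Absolute difference between each digit and the one before it."""
    return [abs(cur - prev) for prev, cur in zip(stimuli, stimuli[1:])]

# The stimulus sequence from the example in the text.
stimuli = [7, 4, 2, 7, 3, 5, 2, 0]
responses = expected_responses(stimuli)
```

Because stimulus digits range from 0 to 7, every correct response also falls
in the range 0 to 7, so the eight-key keyboard covers all possible answers.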

Subjects were tested in two response conditions: 1) vocal (VDETS);
2) manual (keyboard). The order of response conditions was counterbalanced
across subjects. In both conditions a new stimulus digit was presented only
after a correct response. A single-task session consisted of 50 trials.

In the event the subject forgot the previous stimulus digit, he could
request that it be repeated by either pressing a designated key in the keyboard
condition, or by saying "again" in the vocal response condition. Also, in the
vocal response condition if the recognition system failed to understand the
subject's response, he was notified through the Votrax unit, which responded
with the phrase "Say again." In this instance, subjects were instructed to
repeat their response. Average response time on correct trials, average
response time for trials containing errors, the number of errors, the number
of trials with requests for repeated stimuli, and the number of trials with
recognition failures were recorded for each session.

A vertically moving feedback bar graph was used during digit-task
performance. The momentary height corresponded to the average time between
correct responses for the preceding ten trials. The desired level box initially
represented a 3.0-second average correct response interval and the full range
of the bar graph extended from 4.5 to 1.5 seconds from bottom to top,
respectively. When performance improved beyond 3.0 seconds, the desired
level became the current best average. Thus, the criterion for good
performance was continually changed to represent maximum performance during the
50-trial sequence. The underlying scale for feedback in the digit-processing
task was linear in the single-task condition.
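
The behavior of this feedback display can be sketched as follows. The
ten-trial window, the initial 3.0-second goal, and the 4.5-to-1.5-second range
come from the text. The class name, the 0-to-1 height scale, and the rule that
the goal moves only once a full ten-trial window is available are assumptions.

```python
from collections import deque

BOTTOM_SEC, TOP_SEC = 4.5, 1.5   # bar-graph range (from the text)

class DigitFeedback:
    """Running ten-trial average interval mapped onto a linear bar."""

    def __init__(self, goal_sec=3.0, window=10):
        self.goal_sec = goal_sec
        self.intervals = deque(maxlen=window)

    def record(self, interval_sec):
        """Add one inter-response interval; return bar height in 0..1."""
        self.intervals.append(interval_sec)
        avg = sum(self.intervals) / len(self.intervals)
        # Once a full window beats the goal, the goal becomes that average.
        if len(self.intervals) == self.intervals.maxlen and avg < self.goal_sec:
            self.goal_sec = avg
        height = (BOTTOM_SEC - avg) / (BOTTOM_SEC - TOP_SEC)
        return min(1.0, max(0.0, height))
```

A subject averaging 3.0 seconds sees the bar at mid-height, level with the
initial goal box. Faster averages raise the bar and, in this sketch, pull the
goal box up with them.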

DUAL-TASK PROCEDURES

After performing the tasks individually, subjects performed both tasks
together for two 3-minute trials. The order of response modality conditions
for the digit task was again counterbalanced. The tracking task difficulty was
fixed at the attained level of acceleration control achieved in the adaptive portion
of single-task sessions. Performance feedback indicators were again used for
each task; however, the desired performance region represented the mean
single-task performances of the two tasks (see Figure 3). For tracking this
was the mean RMSE percent of scale from the final 2 minutes of single-task
performance. For digit processing the goal represented the mean correct
response latency for the final 30 trials in single-task performance. Thus the
subject was given continuous momentary performance indications, representing
the difference between current dual-task performance and the mean of his
single-task performance. Subjects were instructed to attempt to reach or
exceed these goal lines during the session, and were told that the tasks were
of equal priority. The actual levels that the goals represented were not revealed
to the subjects in the instructions. The first minute of dual-task performance was
excluded from computation of performance measures to reduce warm-up effects.

The movement of the performance feedback indicators in dual-task
conditions was individualized for each subject, based on his mean and standard
deviation from single-task performance. The height of the indicator represented
the difference between single-task performance and the current momentary
dual-task performance measures in standard score units. The formula for
this calculation was:

$$\text{Standard Score} = \frac{\bar{X}_{st} - X_{dt}}{s_{st}}$$

where $\bar{X}_{st}$ represented the single-task mean; $X_{dt}$ represented the
momentary dual-task performance computed over the previous 10 seconds of
tracking, or ten digit responses; and $s_{st}$ was the standard deviation of the
performance distribution in single-task performance. This standard score was
then displayed to the subject as the momentary height of the graph. The range of
height covered 1.5 standard units above and below the mean. For tracking the bar
height was updated every second, and for digit processing, after every
response.
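
The dual-task feedback calculation can be sketched as below. The standard
score and the range of 1.5 standard units follow the text. Mapping the clipped
score onto a 0-to-1 bar height is an assumption about how the display was
scaled.

```python
def standard_score(single_task_mean, dual_task_value, single_task_sd):
    """(single-task mean - momentary dual-task value) / single-task SD."""
    return (single_task_mean - dual_task_value) / single_task_sd

def bar_height(score, limit=1.5):
    """Map a standard score onto 0..1, clipping at +/- limit units.
    A score of zero (dual-task equal to single-task) sits at mid-height."""
    clipped = max(-limit, min(limit, score))
    return (clipped + limit) / (2.0 * limit)
```

Both measures (RMSE and response latency) are better when smaller, so a
positive score means the momentary dual-task value beat the single-task mean
and the bar rises above the goal line.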


RESULTS


A separate analysis of variance (ANOVA) was conducted for each
dependent measure discussed below. The first day of testing was treated as a
learning session. Data from the second day of testing are analyzed below.
Presentation and response modality for the absolute difference task and task
loading (single- vs. dual-task conditions) were the independent variables in this
analysis. Tracking data from two subjects in the auditory stimulus presentation
(VTX) condition were lost due to experimenter error. Data from one subject in the
CRT presentation condition indicated that he failed to learn the digit task, and
his data were excluded from the analysis. One additional subject was randomly
discarded from the CRT presentation group. Data from a total of 16 subjects
were analyzed, eight subjects in each group.

TRACKING  PERFORMANCE 

A digital estimate of root mean square tracking error (RMSE) was
computed, based on the immediately preceding 10 seconds of absolute error
measured every [illegible] msec. Values of RMSE were recorded every 5 seconds
during the 3-minute trials. The mean of these values for the final two minutes
of each trial represented tracking performance.

A graph of RMSE as a function of the experimental conditions is presented
in Figure 4. The ANOVA summary for these scores is shown in Table II. A
Tukey post hoc analysis revealed that tracking performance was reliably
superior in the vocal condition compared to the manual keyboard response
condition. Single-task tracking was superior to both dual-task conditions.
There was no reliable effect of presentation modality or interaction between
presentation and response modes.


Table II

Analysis of Variance for RMS Tracking Error

[Table body not recoverable; surviving entries: 1, .843, .623]

Two dependent measures represented absolute difference task
performance: 1) Average Correct Response Latency (ACRL); and 2) Percent
Correct Trials (PCT). ACRL was computed by averaging latencies of all
responses from trials in which no error or system recognition failure occurred.
PCT was computed by dividing the total number of errorless trials by the total
number of trials, excluding trials containing recognition failures in the voice
condition. Three types of errors could occur in the digit-processing data:
errors made by the subject, misrecognition of a correct response by the VDETS,
and failure of the VDETS to recognize a correct response as a member of the
task vocabulary. Therefore, the denominator in the PCT score could contain an
unknown number of misrecognitions.
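
The two scores can be sketched as follows. The trial record layout (a dict
with `latency`, `error`, and `recognition_failure` keys) is hypothetical, and
PCT is expressed here as a percentage.

```python
def score_session(trials):
    """Return (ACRL, PCT) for a list of trial records.

    Each record is a dict with keys 'latency' (seconds), 'error' (bool),
    and 'recognition_failure' (bool). This layout is an assumption."""
    # Trials with neither a subject error nor a recognition failure.
    clean = [t for t in trials
             if not t["error"] and not t["recognition_failure"]]
    # PCT denominator: all trials except recognition failures.
    counted = [t for t in trials if not t["recognition_failure"]]
    acrl = sum(t["latency"] for t in clean) / len(clean)
    pct = 100.0 * len(clean) / len(counted)
    return acrl, pct
```

A misrecognized correct response is indistinguishable from a subject error in
such a record, which is why the PCT score can contain an unknown number of
misrecognitions.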


The means of the two dependent measures describing digit-processing
performance are summarized in Figures 5 and 6. The analysis of variance
summaries for these data are presented in Tables III and IV. The visual
presentation conditions produced reliably superior performance over the auditory
conditions for ACRL. PCT was not reliably affected by this factor. Response
modality did not affect either of the two measures. However, the interaction of
presentation and response modes was reliable for both scores.

Table III

Analysis of Variance for Average Correct Response Latency

SOURCE                        MS         F

Between Subjects
  A - Stimulus Mode         12.520    34.138
  Subj. w. groups             .387

Within Subjects
  B - Response Mode          0.000     0.000
  C - Task Load               .223     8.055
  A X B                      1.545    20.320
  A X C                       .057     1.543
  B X C                       .018      .533
  A X B X C                  0.000      .011
  B X Subj. w. groups         .058
  C X Subj. w. groups         .037
  BC X Subj. w. groups        .023


Table IV

Analysis of Variance for Percent Correct Trials

SOURCE                      df      MS         F        p

Between Subjects            15
  A - Stimulus Mode          1     .001      .093     .781
  Subj. w. groups           14     .010

Within Subjects             48
  B - Response Mode          1    0.000      .040     .837
  C - Task Load              1     .028     9.812     .008
  A X B                      1     .098    11.145     .005
  A X C                      1     .004     1.387     .258
  B X C                      1    0.000      .249     .830
  A X B X C                  1    0.000      .038     .842
  B X Subj. w. groups       14     .008
  C X Subj. w. groups       14     .003
  BC X Subj. w. groups      14     .003

The main effect of task load was reliable for both performance scores.
Dual-task conditions produced longer response times and higher error rates
than single-task trials.


DISCUSSION 

The purpose of this experiment was to determine whether a speech
understanding system would provide a parallel channel for the performance of an
information processing task concurrently with a continuous visual/manual
control task. The question can be restated in two parts: 1) What combination
of input and output (I/O) channels for the discrete information processing task
provides optimum information transmission for both tasks? 2) Is this optimum
equivalent to single-task performance?

The results indicate that both tracking and digit-processing performance
deteriorated in dual-task conditions. In answer to the second question above,
no combination of I/O channels resulted in dual-task performance equivalent to
single-task performance. The assumption that a speech understanding system
provides a completely parallel channel is apparently unwarranted in this case.

To answer the first question, the data presented in Figures 4, 5, and 6
were redrawn in Figures 7 and 8. Because the priorities between the tasks
were held constant and equal, it seems reasonable to interpret each of the points
plotted in Figures 7 and 8 as an estimate of the center point of each of four
performance operating characteristics. The remaining points for each curve
could theoretically be generated by repeating the experiment with priorities set
for different tradeoffs. Until the shape of the POC for each combination of tasks
is known, the conclusions discussed below concerning the relative merits of
various combinations of I/O channels for the digit task must be regarded as
tentative. The data in Figures 7 and 8 suggest that the conjoint performance of
the tasks is maximized when CRT presentation and voice response are employed
in the digit task. Other combinations of I/O channels for the digit task resulted
in different points in the joint performance tradeoff space.

In practical military aviation systems the placement and organization of
keyboards are usually not optimal, and visual displays are cluttered, while
voice I/O channels are relatively unused. The format of the stimulus display
and the layout of the keyboard and joystick in this experiment were selected to
provide high S-R compatibility and to minimize structural interference between
the time-shared tasks. Any alterations in the stimulus format or
keyboard/joystick relationships would probably either have no effect or be
detrimental to performance of the tasks. A significant improvement in performance
seems unlikely. Therefore, the relative advantage of the voice response channel over
the keyboard channel that is apparent in Figures 7 and 8 may be expected to
increase in a practical system. However, the location of points in the joint
performance space should be explored for tasks that are more representative of
the physical and information processing characteristics of a specific system.

The latency data (ACRL) for the digit-processing task represented the
elapsed time from the onset of the stimulus to the termination of a correct
response. In the voice response mode this time included an average 527 msec
required by the VDETS to accomplish utterance recognition. It is expected that
recognition latency will be reduced to very nearly zero in speech understanding
systems that are currently under development. An estimate of the improvement
in man-machine system performance that such a development would afford can
be obtained by subtracting 0.527 sec from the ACRL data for the voice conditions.
This estimate considerably enhances the apparent advantage for the voice
channel.

Because the voice recognition system could misinterpret subject responses,
perhaps confusing the digits five and nine as might a human listener, the error
score (PCT) for the digit-processing task reflected both subject performance and
system performance. The effect of the task loading variable on error data may
have been due to unknown variations in the speech signal spectrum as a function
of processing load imposed on the subject, resulting in increased
misrecognitions by the system. However, the effect of processing load was also
observed in the latency data. System recognition latency is a function of
vocabulary size, syntax structure, and length of the utterance. Vocabulary size
and syntax for the digit task were fixed; any variations in utterance duration
were due to the subject. The effect of processing load on the latency data
supports the conclusion that the error data reflect errors due to the subject
rather than system recognition failures. Future experiments should include
procedures for confirming off-line that the system's recognition accuracy is
constant across experimental conditions.


A shortcoming in the design of the present experiment was that feedback
to the subject was a function of correct response latency only. Subjects could
maximize the height of the performance feedback indicator in the digit task by
generating very fast, but frequently inaccurate, responses. The problem of
interpreting performance data when the subjects' speed-accuracy tradeoff
strategy has not been controlled is evident from the data for the keyboard
response mode in Figures 7 and 8. As shown in the figures, the CRT-keyboard
combination of I/O channels resulted in faster average correct response latencies
than the VTX-keyboard condition, but accuracy of responses in the
CRT-keyboard condition was lower than in the VTX-keyboard condition. Conjoint
tracking performance for the two conditions was not reliably different. It
appears that subjects in the CRT group assumed a response strategy that
emphasized speed, whereas the VTX group emphasized accuracy. To investigate
this question an analysis of variance was performed, comparing the average
latency of correct and incorrect keyboard responses for the two groups. The
results of the ANOVA are summarized in Table V.


Table V

Analysis of Variance for Response Latencies

SOURCE                        MS        F

Between Subjects
  A - Stimulus Mode
  Subj. w. groups

Within Subjects
  B - Response Type*         1.044    8.241
  A X B                      1.034    8.155
  B X Subj. w. groups         .127

* Correct vs. Incorrect Responses.


If subjects were trading speed for accuracy, incorrect responses should,
on the average, have been faster than correct responses. The results indicated
that incorrect responses were slower than correct responses for the CRT group.
There was no difference between correct and incorrect responses for the VTX
group. If the assumption is made that subjects attempted to maintain a consistent
speed-accuracy strategy in all experimental conditions, the results suggest that
the two measures of performance in the digit task (ACRL and PCT) reflect processes
that are differentially affected by stimulus presentation and response modes. In
future experiments an attempt should be made to control the subjects'
speed-accuracy tradeoff strategy, perhaps by computing feedback as a joint function
of speed and accuracy, and varying the contribution of each to the level of a
performance feedback indicator. A related experimental question concerns
whether the results reported in this paper are representative of human
performance in tasks that lack graphical indicators of performance. Future
experiments should also include conditions in which graphical performance
indicators are not available to the subject, in order to assess the effects of the
feedback display on the shapes of the performance operating characteristics for
the tasks.
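
One way the suggested joint feedback function might look is sketched below.
This is not a procedure from the experiment. The weighted-sum form, the
parameter names, and the reuse of the 4.5-to-1.5-second latency range from the
single-task display are all assumptions.

```python
def joint_feedback(latency_sec, proportion_correct, speed_weight=0.5,
                   slow_sec=4.5, fast_sec=1.5):
    """Weighted blend of a speed score and an accuracy score, each in 0..1.

    speed_weight sets the contribution of speed to the indicator level;
    accuracy receives the remaining weight (hypothetical formulation)."""
    speed = (slow_sec - latency_sec) / (slow_sec - fast_sec)
    speed = min(1.0, max(0.0, speed))
    return speed_weight * speed + (1.0 - speed_weight) * proportion_correct
```

With `speed_weight` set to 1.0 the indicator reduces to the latency-only
feedback used in this experiment. Lowering the weight removes the payoff for
fast but inaccurate responding.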

The interaction between stimulus presentation mode and response mode
for both accuracy and latency scores in the digit-processing task indicated
that for auditory inputs, the keyboard response mode was superior, and for
visual inputs, the voice response mode resulted in better performance. In
the auditory input, vocal output condition, the acoustical attributes of both
stimulus and response may have been a unique source of interference that
contributed to reduced performance efficiency. It seems especially likely that
rehearsal and retrieval processes active during digit-task performance were
more susceptible to disruption by intervening vocal responses than by manual
responses. The results demonstrated that the peculiar information processing
requirements of a task must be taken into account when specifying the I/O
channel structure for optimum task performance.

CONCLUSIONS 

The results indicated that the combination of CRT stimulus mode and
voice response mode provided optimum joint-task performance. No combination
of I/O channels resulted in equivalent single- and dual-task performance.
Future experiments should be designed to investigate the joint-performance
space for tasks that are more representative of the information processing
performance requirements of specific systems. However, the interpretability
of the results of such research will depend upon the solution of methodological
problems, such as how to control or account for the subjects' speed-accuracy
tradeoff strategies and the priorities they place upon the concurrent tasks.